{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Time series classification with Mr-SEQL\n",
    "\n",
    "Mr-SEQL\\[1\\] is a univariate time series classifier which train linear classification models (logistic regression) with features extracted from multiple symbolic representations of time series (SAX, SFA). The features are extracted by using SEQL\\[2\\].\n",
    "\n",
    "\\[1\\] T. L. Nguyen, S. Gsponer, I. Ilie, M. O'reilly and G. Ifrim Interpretable Time Series Classification using Linear Models and Multi-resolution Multi-domain Symbolic Representations in Data Mining and Knowledge Discovery (DAMI), May 2019, https://link.springer.com/article/10.1007/s10618-019-00633-3\n",
    "\n",
    "\\[2\\] G. Ifrim, C. Wiuf\n",
    "“Bounded Coordinate-Descent for Biological Sequence Classification in High Dimensional Predictor Space” (KDD 2011)\n",
    "\n",
    "In this notebook, we will demonstrate how to use Mr-SEQL for univariate time series classification with the ArrowHead dataset.\n",
    "\n",
    "## Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:32:16.702197Z",
     "iopub.status.busy": "2020-12-19T14:32:16.701456Z",
     "iopub.status.idle": "2020-12-19T14:32:17.733638Z",
     "shell.execute_reply": "2020-12-19T14:32:17.734096Z"
    }
   },
   "outputs": [],
   "source": [
    "from sklearn import metrics\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "from sktime.classification.shapelet_based import MrSEQLClassifier\n",
    "from sktime.datasets import load_arrow_head, load_basic_motions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load data\n",
    "For more details on the data set, see the [univariate time series classification notebook](https://github.com/alan-turing-institute/sktime/blob/main/examples/02_classification_univariate.ipynb)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:32:17.738163Z",
     "iopub.status.busy": "2020-12-19T14:32:17.737618Z",
     "iopub.status.idle": "2020-12-19T14:32:17.784112Z",
     "shell.execute_reply": "2020-12-19T14:32:17.784645Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(158, 1) (158,) (53, 1) (53,)\n"
     ]
    }
   ],
   "source": [
    "X, y = load_arrow_head(return_X_y=True)\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)\n",
    "print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train and Test\n",
    "\n",
    "Mr-SEQL can be configured to run in different mode with different symbolic representation.\n",
    "\n",
    "seql_mode can be either 'clf' (SEQL as classifier) or 'fs' (SEQL as feature selection). If 'fs' mode is chosen, a logistic regression classifier will be trained with the features extracted by SEQL.\n",
    "'fs' mode is more accurate in general.\n",
    "\n",
    "symrep can include either 'sax' or 'sfa' or both. Using both usually produces a better result.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:32:17.788637Z",
     "iopub.status.busy": "2020-12-19T14:32:17.788056Z",
     "iopub.status.idle": "2020-12-19T14:32:32.600131Z",
     "shell.execute_reply": "2020-12-19T14:32:32.600719Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Accuracy with mr-seql: 0.887\n"
     ]
    }
   ],
   "source": [
    "# Create mrseql object\n",
    "# use sax by default\n",
    "ms = MrSEQLClassifier(seql_mode=\"clf\")\n",
    "# use sfa representations\n",
    "# ms = MrSEQLClassifier(seql_mode='fs', symrep=['sfa'])\n",
    "# use sax and sfa representations\n",
    "# ms = MrSEQLClassifier(seql_mode='fs', symrep=['sax', 'sfa'])\n",
    "\n",
    "# fit training data\n",
    "ms.fit(X_train, y_train)\n",
    "\n",
    "# prediction\n",
    "predicted = ms.predict(X_test)\n",
    "\n",
    "# Classification accuracy\n",
    "print(\"Accuracy with mr-seql: %2.3f\" % metrics.accuracy_score(y_test, predicted))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train and Test\n",
    "Mr-SEQL also supports multivariate time series. Mr-SEQL extracts features from each dimension of the data independently."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:32:32.604176Z",
     "iopub.status.busy": "2020-12-19T14:32:32.603714Z",
     "iopub.status.idle": "2020-12-19T14:32:32.667409Z",
     "shell.execute_reply": "2020-12-19T14:32:32.667950Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(60, 6) (60,) (20, 6) (20,)\n"
     ]
    }
   ],
   "source": [
    "X, y = load_basic_motions(return_X_y=True)\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)\n",
    "print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:32:32.674699Z",
     "iopub.status.busy": "2020-12-19T14:32:32.674233Z",
     "iopub.status.idle": "2020-12-19T14:32:41.396464Z",
     "shell.execute_reply": "2020-12-19T14:32:41.397048Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Accuracy with mr-seql: 1.000\n"
     ]
    }
   ],
   "source": [
    "ms = MrSEQLClassifier()\n",
    "\n",
    "# fit training data\n",
    "ms.fit(X_train, y_train)\n",
    "\n",
    "predicted = ms.predict(X_test)\n",
    "\n",
    "# Classification accuracy\n",
    "print(\"Accuracy with mr-seql: %2.3f\" % metrics.accuracy_score(y_test, predicted))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
